Abstract. In this paper, we review a method for computing and parameterizing the set of homotopy
classes of chain maps between two chain complexes. This is then applied to finding topologically meaningful
maps between simplicial complexes, which, in the context of topological data analysis, can be viewed as an
extension of conventional unsupervised learning methods to simplicial complexes.

1. Introduction

One goal of topological data analysis is to adapt algebraic topological methods to the context of point cloud
data (i.e. finite metric spaces). The generalization of homology to this setting is called persistent homology
[ZC05, Car09]. Persistent homology has been successful at providing insight into nonlinear datasets that
would not be accessible with more classical methods. Although maps between spaces play a fundamental
role in algebraic topology, most of the developments in topological data analysis have focused on the spaces
and datasets themselves. In this paper, we propose a method for studying these maps.

Suppose that we compute the homology of a space X. Comparing the homology of X with the known
homology of other spaces can suggest that X is homeomorphic to a previously understood space, or that
there should be maps exhibiting prescribed homological behavior to model spaces with known homology.
One thinks of this process as a kind of non-linear coordinatization. Standard methods for introducing useful
coordinates on a point cloud include Principal Component Analysis and Multidimensional Scaling. Both of
these methods often work well when the structure of the space is essentially Euclidean. However, when the
space carries noncontractible topology, as in the case of a circle, these methods do not provide a way of
mapping the data set to a nonlinear model. We describe some examples where the ability to construct maps
to nonlinear targets would be useful.

Circular coordinatization: In situations where one finds that the persistent homology of the data set 
is that of a circle, it is natural to attempt to find a map to the circle. In this special case, there is a natural 
methodology using the persistent cohomology generator in dimension 1 which allows one to construct the 
map. The procedure is described in detail in [dSVJ09].

Natural image example: In [CISZ06], homological calculations were carried out to confirm that a
space of frequently occurring motifs within 3 × 3 image patches in natural images had the homology of a
Klein bottle. Once one is given the homology, it is then of great interest to attempt to construct
an actual parametrization by a Klein bottle. This was done by hand in [CISZ06], but one would like to
automate the procedure. 

Gene expression data: Gene expression microarrays provide a powerful tool for obtaining information 
about many biological phenomena, including cancer. They produce high dimensional data, where the coordinates
consist of probes representing particular genes. It is very well known that the results of such studies are
highly dependent on platforms, procedures within the laboratories performing the studies, as well as many 
other factors. All this can distort the geometry of the data set, but one can hope that certain topological 
features would still be preserved, which might permit one to map one data set to another in a nonlinear way, 
preserving the relevant features. Often [NLC10], the geometry of these data sets is represented by shapes
like the one pictured below.

In this case, since the tree is contractible, the direct use of homology will not be useful, since homology
vanishes on contractible spaces. However, relative homology of the pair $(X, \partial X)$, where $\partial X$ denotes a
suitably defined boundary of the space, does capture the existence of branches or flares in the geometric 
object. There are reasonable ways of defining the boundary of the metric space, and the points in the 
boundary have significance in that they tend to consist of most representative phenomena of a particular 
subclass of samples. For example, type I and type II diabetes can be distinguished in this way, as can 
various molecular subtypes of cancer. In this case, one could then fix the map on the boundary, and study 
the relative mapping problem in which one enforces the constraint that the boundary is carried into a small 
neighborhood of the boundary. This kind of mapping is of a great deal of importance, since the problem of 
reconciling different data sets constructed on the same kinds of tumors or other disorders is a major problem 
in this area. 

From the perspective of topological data analysis, we desire a map between two simplicial complexes X 
and Y to satisfy the following: 

• Such a map must be functorial: It must induce homomorphisms on the homology of X and Y.

• Ideally, the map would be simplicial. This means that the image of a simplex should be a chain in
Y containing either 0 or 1 simplices. Simplicial maps are particularly well behaved: they are
determined by their values on vertices and can be fully extended by linear interpolation. Additionally,
they have a nice geometric interpretation. In general, we obtain maps between chains in X and Y.
Thus, the typical situation is that we have $f(\sigma) = \sum_{\tau \in Y,\, |\tau| = |\sigma|} a_\tau \tau$.

• Even though a map might be a chain map (hence inducing homomorphisms, or even isomorphisms 
in homology), such a map might be unsatisfactory in the eyes of a practitioner. We want these 
mappings to reveal some sort of information about the structure of one complex in terms of the 
other. A common situation might be that X is created from a large and high-dimensional dataset, 
and Y is a simple model with the same homology. In this case, a map $X \to Y$ can be thought of
as performing a kind of topological dimensionality reduction. This leads to the process of geometric 
regularization through optimization. 

Unfortunately, simplicial maps do not always exist - an example is the case of mapping from a triangle to 
a square. For this reason, we wish to find maps that are as close to simplicial as possible. Our investigation
of this problem proceeds as follows:

• The first step is to compute a parameterization of the homotopy classes of chain maps
between two finite simplicial complexes. This is accomplished quite easily using a simple trick from
homological algebra. (See sections 2 and 3.)

• We wish to develop optimization problems that yield maps that are as close to simplicial as possible. 
Additionally, we would like these problems to satisfy other properties. For example, when they exist, 
simplicial maps should be contained in the set of optima, and to preserve tractability the problem 
should be convex. We discuss these criteria in section 4 and formulate two different optimization
problems over the parameterization. The first one is convex and can be formulated as a linear 
program, while the second is non-convex but has the property that its minima are precisely the set 
of simplicial maps when they exist. 

• In section 5, we discuss three applications to various scenarios in topological data analysis: manifold-valued
coordinates, density maximization, and mappings of contractible spaces. In the third example,
we show how to compute mappings between trees by considering their relative homology with respect
to a boundary defined by a filter function.

The two main existing investigations into the area of computing maps are [dSVJ09] and [Din09]. In
the paper [dSVJ09], de Silva and Vejdemo-Johansson present a method based on persistent cohomology for
computing circular-valued coordinates on statistical data sets. The fundamental idea behind their work is
to use the Brown representation of cohomology. Since we know that $S^1$ is the Eilenberg-MacLane space for
the group $\mathbb{Z}$, we have that

$H^1(X; \mathbb{Z}) = [X, K(\mathbb{Z}, 1)] = [X, S^1]$

We can compute a map from a space X to the circle $S^1$ by choosing a representative from a cohomology
class in $H^1(X; \mathbb{Z})$. In practice, they also perform an optimization step where they select the smoothest cocycle.
Although this works very well for the case of a circle, it is not practically generalizable to spaces other than
$S^1$. For example, $K(\mathbb{Z}, 2)$ is the infinite-dimensional complex projective space $\mathbb{CP}^\infty$, and $K(\mathbb{Z}/2\mathbb{Z}, 1)$ is $\mathbb{RP}^\infty$.

In the PhD thesis of Yi Ding [Din09], the mapping problem is investigated from a combinatorial perspective.
The hom-complex is used, as it is here, to obtain a parameterization of the space of chain maps, but
combinatorial optimization techniques are used to select maps that satisfy certain criteria. Our work can be 
seen as an extension of Ding's work to the continuous case. 

We close the introduction with a remark on how we can think about the simplicial mapping problem as 
a version of higher order clustering. Clustering can be thought of as computing a map from a dataset X to 
the discrete space $Y = \{1, \ldots, k\}$. In the figure below, a cluster assignment is a mapping from each point in
the set on the left to the set {1, 2, 3}.

Suppose that we construct a filtered simplicial complex from X with some maximum filtration parameter
$r_{\max}$. Examples of such constructions include the Vietoris-Rips complex, the Čech complex, and others. In
this case, a clustering assignment is a mapping from the filtered complex to $\{1, \ldots, k\}$ that is constant on the
path components of the complex. It is easy to verify that this is equivalent to the chain map property on
dimension 0: If e is an edge between u and v in the same path component, then $f(u) - f(v) = f(\partial e) = \partial f(e) \sim 0$.
Thus, a higher dimensional analogue of clustering is the computation of some homotopy representative of
a class of chain maps for n-simplices, with n > 0.
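The dimension-0 condition can be illustrated on a toy graph. The sketch below is not taken from the paper: it assumes the complex is given by a vertex count and an edge list (hypothetical inputs), labels path components with a small union-find, and checks that the labels agree across every edge, which is the condition $f(u) - f(v) = 0$ discussed above.

```python
def component_labels(num_vertices, edges):
    """Label each vertex 1..k by its path component (union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    roots = sorted({find(v) for v in range(num_vertices)})
    label = {r: k + 1 for k, r in enumerate(roots)}
    return [label[find(v)] for v in range(num_vertices)]


def constant_across_edges(labels, edges):
    """True when f(u) - f(v) = 0 for every edge [u, v]."""
    return all(labels[u] - labels[v] == 0 for u, v in edges)


# Toy 1-skeleton: components {0, 1, 2}, {3, 4} and {5}.
toy_edges = [(0, 1), (1, 2), (3, 4)]
toy_labels = component_labels(6, toy_edges)
```

Any labeling that splits a path component fails the edge check, so the valid cluster assignments are exactly the labelings that pass it.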

2. Definitions and Basics 

In this section we review some basic definitions for completeness. This material can be found in a standard 
text on algebraic topology such as [Hat02] or [Mun93]. The material on the hom-complexes can be found in
[ML75].

Definition 1. A chain map between two chain complexes $(A_n, d_n)$ and $(B_n, d'_n)$ is a family of homomorphisms
$f_n : A_n \to B_n$ such that for each n we have that $f_{n-1} d_n = d'_n f_n$.
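As a concrete check of Definition 1, the following sketch tests the identity $f_0 d_1 = d'_1 f_1$ on the hollow triangle. The matrix conventions (columns are source simplices, the boundary of [u, v] is [v] - [u], edges ordered [0,1], [0,2], [1,2]) are assumptions made for this illustration.

```python
def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# d_1 of the hollow triangle: rows are vertices [0], [1], [2],
# columns are edges [0,1], [0,2], [1,2].
D1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]


def is_chain_map(f0, f1, d_src=D1, d_tgt=D1):
    """Check f_0 d_1 = d'_1 f_1, the only nontrivial degree here."""
    return matmul(f0, d_src) == matmul(d_tgt, f1)


# Rotation 0 -> 1 -> 2 -> 0 on vertices ...
ROT0 = [[0, 0, 1],
        [1, 0, 0],
        [0, 1, 0]]
# ... and on edges: [0,1] -> [1,2], [0,2] -> -[0,1], [1,2] -> -[0,2].
ROT1 = [[0, -1, 0],
        [0, 0, -1],
        [1, 0, 0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Pairing the rotated vertex map with the identity on edges breaks the identity, so that pair is a family of homomorphisms but not a chain map.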

Definition 2. Given two chain maps f and g between the chain complexes $(A_n, d_n)$ and $(B_n, d'_n)$, a chain
homotopy is a family of homomorphisms $s_n : A_n \to B_{n+1}$ such that for each n we have that

(1) $d'_{n+1} s_n + s_{n-1} d_n = f_n - g_n$

The key fact is that if the chain maps f and g are chain-homotopic, then they induce the same homomorphism
on homology.

Definition 3. Given two chain complexes $(A_*, d_*)$ and $(B_*, d'_*)$, we define a new complex as follows. Let

(2) $\mathrm{Hom}_n(A_*, B_*) = \prod_{p=-\infty}^{\infty} \mathrm{Hom}(A_p, B_{p+n})$

An element f of $\mathrm{Hom}_n(A_*, B_*)$ is a family of homomorphisms $f_p : A_p \to B_{p+n}$ for $p \in \mathbb{Z}$. Note that even
in the case where the chain complexes $(A_*, d_*)$ and $(B_*, d'_*)$ have non-negative support, the chain complex
$\mathrm{Hom}_*(A_*, B_*)$ will have non-trivial negative terms in general.

Now that we have defined the terms in the complex, we need to define connecting homomorphisms

$d^H_n : \mathrm{Hom}_n(A_*, B_*) \to \mathrm{Hom}_{n-1}(A_*, B_*)$

If $f \in \mathrm{Hom}_n(A_*, B_*)$, then $d^H_n(f)$ will itself be a family of homomorphisms. Let us denote by $f_p$ the
component mappings on the individual modules $A_p$. To define $d^H_n(f)$, it is sufficient to specify its action on
elements of $A_p$ for all p. Let $a \in A_p$ be an arbitrary element. We define

$d^H_n(f)(a) = d'_{p+n}(f_p(a)) + (-1)^{n+1} f_{p-1}(d_p(a))$

It is easy to see that $d^H$ satisfies $d^H_n d^H_{n+1} = 0$. We summarize the properties of the hom-complex in the
proposition below. Each claim can be proved by a simple computation.
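The boundary formula can be exercised numerically. The sketch below is an illustration under assumed conventions, not the authors' implementation: an element of $\mathrm{Hom}_n$ is stored as a dictionary from p to the matrix of $f_p$, and the complexes are described by the hypothetical arguments `dims_a`, `dims_b` (dimensions of the nonzero chain groups) and `d_a`, `d_b` (boundary matrices).

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b, sign=1):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def zeros(rows, cols):
    return [[0] * cols for _ in range(rows)]


def hom_boundary(f, n, dims_a, dims_b, d_a, d_b):
    """Apply d^H_n to f = {p: matrix of f_p : A_p -> B_{p+n}}.

    d_a[p] and d_b[p] are the matrices of the boundaries C_p -> C_{p-1}.
    Returns {p: matrix A_p -> B_{p+n-1}}, an element of Hom_{n-1}.
    """
    sign = (-1) ** (n + 1)
    out = {}
    for p, dim_p in dims_a.items():
        q = p + n - 1                       # target degree of component p
        if q not in dims_b:
            continue
        g = zeros(dims_b[q], dim_p)
        if p in f and (p + n) in d_b:       # term d'_{p+n} f_p
            g = add(g, matmul(d_b[p + n], f[p]))
        if (p - 1) in f and p in d_a:       # term (-1)^{n+1} f_{p-1} d_p
            g = add(g, matmul(f[p - 1], d_a[p]), sign)
        out[p] = g
    return out


# Hollow triangle: three vertices, three edges [0,1], [0,2], [1,2].
DIMS = {0: 3, 1: 3}
BOUNDARY = {1: [[-1, -1, 0], [1, 0, -1], [0, 1, 1]]}

# An arbitrary element s of Hom_1 (vertices -> edges).
s = {0: [[1, 2, 0], [0, -1, 3], [4, 0, 1]]}
ds = hom_boundary(s, 1, DIMS, DIMS, BOUNDARY, BOUNDARY)
dds = hom_boundary(ds, 0, DIMS, DIMS, BOUNDARY, BOUNDARY)

# The identity chain map is a cycle in Hom_0.
eye = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
d_identity = hom_boundary({0: eye, 1: eye}, 0, DIMS, DIMS, BOUNDARY, BOUNDARY)
```

Applying the boundary twice to `s` gives the zero family, and the identity map has zero boundary, in line with the first claim of the proposition that follows.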

Proposition 1. The hom-complex satisfies the following homological properties:

(1) $f \in \mathrm{Hom}_0(A_*, B_*)$ is a cycle ($d^H_0(f) = 0$) if and only if f is a chain map.

(2) $f \in \mathrm{Hom}_0(A_*, B_*)$ is the boundary of some element $s \in \mathrm{Hom}_1(A_*, B_*)$ if and only if f is chain
homotopic to the zero map via s.

(3) The zeroth homology group $H_0(\mathrm{Hom}_*(A_*, B_*))$ consists of homotopy classes of chain maps.

Thus we have a nice characterization of the homotopy classes of chain maps - all we need to do is compute 
the homology of the hom-complex. Note that in this paper, if we use the term homotopy we actually mean
the term "chain homotopy". In other words, we only deal with the algebraic notion and not the topological
one. Furthermore, by a map between chain complexes, we really mean a chain map. 

Remark 1. In our investigation, we will restrict ourselves to field coefficients for two computational reasons.
Firstly, the homology computations will be performed in a persistent setting for which one must work over a
field (see [ZC05]). Secondly, working over a field of characteristic 0 greatly enhances the efficiency of the later
optimization steps. As a bonus for working with field coefficients, we get a more manageable representation
of the individual Hom terms, since for finite-dimensional vector spaces V and W we have that
$\mathrm{Hom}(V, W) = V^{\wedge} \otimes W$. We denote the vector space dual of V with $V^{\wedge}$. In this case, the n-th term in the hom-complex is

(3) $\mathrm{Hom}_n(A_*, B_*) = \prod_{p=-\infty}^{\infty} (A_p)^{\wedge} \otimes B_{p+n}$

This is very similar to another standard construction in homology theory - the tensor product of two chain
complexes. Thus an element $f \in \mathrm{Hom}_n(A_*, B_*)$ may be written as $f = \sum_{i,j} c_{ij}\, a_i^* \otimes b_j$. From the computational
point of view, this representation is particularly useful since most of the coefficients $c_{ij}$ will be zero and do
not need to be stored.

We now close this section with a theorem which gives a useful characterization of the homology of the 
hom-complex in special cases. A proof may be found in chapter III of [ML75].

Proposition 2. Let $(A_n, d_n)$ and $(B_n, d'_n)$ be chain complexes of R-modules. Then there exists the following
exact sequence for each n

(4) $0 \to \prod_{k=-\infty}^{\infty} \mathrm{Ext}^1_R(H_k(A), H_{k+n+1}(B)) \to H_n(\mathrm{Hom}_*(A, B)) \to \prod_{k=-\infty}^{\infty} \mathrm{Hom}(H_k(A), H_{k+n}(B)) \to 0$

In the special case when we are dealing with vector spaces over a field $\mathbb{F}$, then the Ext term vanishes and
we have an explicit expression for the Hom term. So we get that

(5) $H_n(\mathrm{Hom}_*(A_*, B_*)) = \prod_{k=-\infty}^{\infty} \mathrm{Hom}(H_k(A), H_{k+n}(B)) = \prod_{k=-\infty}^{\infty} H_k(A) \otimes H_{k+n}(B)$




3. Finding Mappings between Simplicial Complexes 

In this section we apply the algebraic techniques of the previous section to computing a parameterization 
of the homotopy classes of maps between two simplicial complexes. 

Suppose we have simplicial complexes X and Y with $X = \{\sigma_i\}$, $Y = \{\tau_j\}$. So $\{\sigma_i\}$ and $\{\tau_j\}$ are bases
for the vector spaces (over a field $\mathbb{F}$) $C_*(X)$ and $C_*(Y)$. We also denote the dual basis of $C_*(X)$ with $\{\sigma_i^*\}$.
Then, the vector space $\mathrm{Hom}(C_*(X), C_*(Y))$ has basis $\{\sigma_i^* \otimes \tau_j\}$.

Based on our previous discussion about hom-complexes, we know that the set of homotopy classes of maps
between X and Y is given by the 0-th homology $H_0(\mathrm{Hom}_*(C_*(X), C_*(Y)))$ with

(6) $\mathrm{Hom}_n(C_*(X), C_*(Y)) = \prod_{p=-\infty}^{\infty} C^p(X) \otimes C_{p+n}(Y)$

In practice, we are not interested in the rank or Betti numbers of the homology group, but rather we
wish to find representatives for the homotopy classes. As described in [Mun93], homology can be computed
via the Smith normal form in the case of coefficients in a PID, or with Gaussian elimination in the case of
coefficients in a field.
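For field coefficients, the elimination step reduces to rank computations. The sketch below is a minimal illustration over the rationals and only returns Betti numbers; extracting representative cycles, which is what the method here actually needs, would additionally require tracking the row operations. The function names and the input layout are assumptions for this example.

```python
from fractions import Fraction


def rank(matrix):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            if factor:
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def betti_numbers(dims, boundaries):
    """dims[k] = dim C_k; boundaries[k] = matrix of d_k : C_k -> C_{k-1}.

    Uses b_k = dim C_k - rank d_k - rank d_{k+1}.
    """
    ranks = {k: rank(b) for k, b in boundaries.items()}
    return [dims[k] - ranks.get(k, 0) - ranks.get(k + 1, 0)
            for k in range(len(dims))]


# Hollow triangle: rows are vertices, columns are edges [0,1], [0,2], [1,2].
TRIANGLE_D1 = [[-1, -1, 0], [1, 0, -1], [0, 1, 1]]
triangle_betti = betti_numbers([3, 3], {1: TRIANGLE_D1})
```

For the hollow triangle this gives one connected component and one loop.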

The result of the homology computation is a set of representative cycles (equivalently chain maps), which
we denote $\{f_m\}$. We can also easily obtain the set of chain homotopies to zero by computing the columns
(the image) of the matrix representation of $d^H_1$. Let us denote these columns by $\{h_n\}$.

The general parameterization of the affine space of homotopy classes is

$[X, Y] = \left\{ \sum_m b_m f_m + \sum_n c_n h_n \;\middle|\; b_m \in \mathbb{F},\ c_n \in \mathbb{F} \right\}$

Note that the above expression uses the notation [X, Y] for the chain-homotopy classes of chain maps
between X and Y, and not the homotopy classes of all continuous maps. Let us denote the m-th chain map
by $f_m = \sum_{i,j} f^m_{ij}\, \sigma_i^* \otimes \tau_j$ and the n-th chain homotopy by $h_n = \sum_{i,j} h^n_{ij}\, \sigma_i^* \otimes \tau_j$.

3.1. Example. Let X be a triangle with vertices [0], [1], [2], and edges [0, 1], [0, 2], [1, 2]. Let Y = X. Note
that by the homotopy classification theorem of the previous section (Proposition 2), we have that

$H_0(\mathrm{Hom}_*(C_*(X), C_*(Y))) \cong \prod_{k=-\infty}^{\infty} H_k(X) \otimes H_k(Y) \cong \mathbb{F} \oplus \mathbb{F}$

Thus we have two generating cycles. The two computed representatives of $H_0(\mathrm{Hom}_*(C_*(X), C_*(Y)))$ are
given below, where $\sigma \to \tau$ denotes the simple tensor $\sigma^* \otimes \tau$:

$f_0 = [0] \to [0] + [1] \to [0] + [2] \to [0]$

$f_1 = -[1,2] \to [0,2] + [1,2] \to [0,1] + [1,2] \to [1,2]$

The set of homotopies is given by the image of the 1-dimensional boundary matrix $d^H_1$. The first few (out
of a total of 9) are:

$h_0 = [1,2] \to [0,1] + [0,2] \to [0,1] + [2] \to [1] - [2] \to [0]$
$h_1 = [1,2] \to [0,2] + [0,2] \to [0,2] + [2] \to [2] - [2] \to [0]$
$h_2 = [2] \to [2] - [2] \to [1] + [1,2] \to [1,2] + [0,2] \to [1,2]$

We can see that the first generating cycle induces an isomorphism $H_0(X) \to H_0(Y)$, and induces the zero
map on dimension-1 homology. Similarly, the second generating cycle induces an isomorphism $H_1(X) \to H_1(Y)$
and induces the zero map on dimension-0 homology. A generator for $H_0(X)$ (equivalently $H_0(Y)$) is
[0], and a generator for $H_1(X)$ (equivalently $H_1(Y)$) is [0, 1] - [0, 2] + [1, 2]. A quick computation also reveals
that adding a chain homotopy does not change the induced action of the generating cycles.
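The numbers in this example can be re-derived with a few lines of linear algebra. The sketch below is an independent check over the rationals, not the computation used in the paper; the matrix layout (columns are source simplices, edges ordered [0,1], [0,2], [1,2]) and all function names are assumptions.

```python
from fractions import Fraction

D = [[-1, -1, 0], [1, 0, -1], [0, 1, 1]]  # d_1: edges -> vertices


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def flat(*mats):
    return [x for m in mats for row in m for x in row]


def rank(rows):
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


ZERO = [[0] * 3 for _ in range(3)]
# UNITS[3 * a + b] has a single 1 in row a, column b.
UNITS = [[[int((i, j) == (a, b)) for j in range(3)] for i in range(3)]
         for a in range(3) for b in range(3)]


def hom_d0(g0, g1):
    """d^H_0(g) = d' g_1 - g_0 d, a map from edges to vertices."""
    return sub(matmul(D, g1), matmul(g0, D))


# d^H_1 sends a homotopy S (vertices -> edges) to the pair (D S, S D).
d1_images = [flat(matmul(D, s), matmul(s, D)) for s in UNITS]
d0_images = ([flat(hom_d0(e, ZERO)) for e in UNITS]
             + [flat(hom_d0(ZERO, e)) for e in UNITS])

# dim H_0 = dim Hom_0 - rank d^H_0 - rank d^H_1, with dim Hom_0 = 9 + 9.
dim_h0 = 18 - rank(d0_images) - rank(d1_images)

f0 = ([[1, 1, 1], [0, 0, 0], [0, 0, 0]], ZERO)    # every vertex -> [0]
f1 = (ZERO, [[0, 0, 1], [0, 0, -1], [0, 0, 1]])   # [1,2] -> [0,1] - [0,2] + [1,2]
```

Under these conventions `d1_images[2]` is the image of $[2]^* \otimes [0,1]$ and reproduces $h_0$ term by term, and the dimension count recovers the two generating classes.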

This example also gives us some hints about the selection of the coefficients $\{b_m\}$ for the chain maps. From
the previous paragraphs, we have chain maps (which are homology cycles), $f_0$ which induces an isomorphism
on dimension 0 homology, and $f_1$ which induces an isomorphism on dimension 1 homology. If we take their
sum, $f = f_0 + f_1$ (by setting $b_0 = b_1 = 1$), then it is easy to see that f induces an isomorphism on both
homology groups. In practice this map is the one to start with, since it preserves the homological structure
across all dimensions. Thus in this case we would use the parameterization

$\left\{ f + \sum_n c_n h_n \;\middle|\; c_n \in \mathbb{F} \right\}$



4. Searching the Parameterization 

Given two simplicial complexes X and Y, we now know how to compute a parameterization for the 
homotopy classes of chain maps between them. However, in a statistical application setting we are interested 
in selecting only one geometrically meaningful map from this set. Some reasonable criteria for such a map 
are: 

• Image cardinality (simplicialness): In general, the image of a simplex $\sigma$ under a map g will be
some linear combination $g(\sigma) = \sum_j c_j \tau_j$. Ideally, we would want the number of non-zero
coefficients $c_j$ to be as small as possible.

• Preimage cardinality: Likewise, we would prefer that the number of non-zero coefficients in the
expression $g^*(\tau) = \sum_i c_i \sigma_i$ be as small as possible.

• Locality: We would like the image of a simplex to be localized in the codomain. This means that
the non-zero terms in the expression $g(\sigma) = \sum_j c_j \tau_j$ should not be spread apart.

• Unbiasedness: There should be no a-priori reason to prefer one map over another if they achieve the 
same optima. For example, in the case of a triangle mapping to a triangle, there is no geometric way 
of distinguishing the identity map from one of the two rotations of the vertices (both are simplicial 
and optimally localized). 

• Convexity: The optimization problem should be convex. 

Unfortunately, the above criteria are mutually incompatible. To see this, it suffices to consider the case 
where X and Y are both triangles. The optimally sparse and localized chain maps include the three rotations 
of the vertices. However, the unbiased property says that each of these three maps should be elements of 
the set of optima of the optimization problem. If we require the problem to be convex, then it turns out 
that the set of optima must also be convex and in particular connected. Thus if one takes a non-trivial 
convex combination (say with coefficients (1/3, 1/3, 1/3)), that will also be an optimum, but it will violate
the condition of sparsity. One remedy for this is to require that the set of optima is the convex hull of the 
unbiased set of points. Alternatively, one can discard unbiasedness and require that only one sparse point 
be returned. 
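The conflict between convexity and sparsity is easy to see numerically. The sketch below looks only at the vertex component of the three rotations of a triangle and uses one particular convex objective, the largest absolute column sum plus the largest absolute row sum; that choice is an assumption made for this illustration.

```python
from fractions import Fraction


def rotation(k, n=3):
    """Permutation matrix sending vertex i to vertex (i + k) mod n."""
    return [[Fraction(1) if (j + k) % n == i else Fraction(0)
             for j in range(n)] for i in range(n)]


def objective(m):
    """Largest absolute column sum plus largest absolute row sum (convex)."""
    cols = max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))
    rows = max(sum(abs(x) for x in row) for row in m)
    return cols + rows


def nonzeros_per_column(m):
    return [sum(1 for row in m if row[j] != 0) for j in range(len(m[0]))]


rotations = [rotation(k) for k in range(3)]
# Convex combination with coefficients (1/3, 1/3, 1/3).
average = [[sum(r[i][j] for r in rotations) / 3 for j in range(3)]
           for i in range(3)]
```

The average attains the same objective value as each rotation, yet every vertex is now spread over all three vertices of the target.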

4.1. The Combinatorial Approach is Hard: Theory. The purpose of this section is to provide some 
orientation regarding the complexity of finding nice maps. This assumes that we are taking a combinatorial 
approach as discussed in [Din09].

Our ultimate goal would be to construct a map that is simplicial in both directions. This means that the 
image and preimage of each simplex contains zero or one simplices. Let us call such a map bisimplicial. In 
other words, such a map would minimize the quantity 

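(7) $\max_{\sigma \in X} \|f(\sigma)\| + \max_{\tau \in Y} \|f^*(\tau)\|$

Here $\|\cdot\|$ counts the simplices that appear with non-zero coefficient. When a bisimplicial map exists, the
above function has a minimum value of 2. However, it turns out that the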
above optimization problem is hard in a precise sense: 

Proposition 3. Finding whether or not a bisimplicial map exists is at least as difficult as solving the graph 
isomorphism problem. 

Proof. We use the standard technique of performing a polynomial-time reduction of an instance of the graph 
isomorphism problem to the bisimpliciality problem. In other words, suppose that we have an oracle that 
can tell us whether a bisimplicial map between two complexes exists in polynomial time. Then we must 
show that we can also determine whether there exists a graph isomorphism between two graphs G and H in 
polynomial time. Let us construct a machine that solves this problem in polynomial time. 

Suppose that we are given two graphs G and H. Note that we can take care of any polynomial time 
graph invariants beforehand. For example this means that we may answer "No" if G and H have different 
numbers of vertices or edges. Similarly, since homology over a field may be computed in polynomial time
(see [ZC05]), we may also answer "No" if $H_*(G) \not\cong H_*(H)$. Thus suppose that using our oracle we have
constructed a bisimplicial map f that induces an isomorphism on homology between $C_*(G)$ and $C_*(H)$.

We claim that f is a graph isomorphism. Let us denote the vertices of G and H by V(G) and V(H),
and the edges by E(G) and E(H). By bisimpliciality, f must be an isomorphism between V(G) and V(H).
Suppose that $u \sim v$ in G (i.e. the edge [u, v] exists in G). Then by the fact that f is a chain map,
$\partial f([u,v]) = f(\partial [u,v]) = f(v - u) = f(v) - f(u)$. Thus f([u, v]) is an edge between f(u) and f(v) in H. So
$f(u) \sim f(v)$ in H.

Conversely, suppose that there are vertices x, y in H such that $x \sim y$ and f(u) = x and f(v) = y, but
with $u \not\sim v$. However, we must have an edge $[s,t] \in E(G)$ such that f([s, t]) = [x, y] (if not, f would not be
bisimplicial or would not be a homology isomorphism). But then at least one of the following holds: $s \neq u$ or
$t \neq v$. Without loss of generality, $s \neq u$. Then we have that f(s) = f(u) = x, contradicting the bisimpliciality
of f. Thus we must have that $u \sim v$, and hence f is a graph isomorphism. Thus, answering "Yes" when a
bisimplicial map exists, and "No" otherwise solves the graph isomorphism decision problem. □

Unfortunately, it is not known whether the graph isomorphism problem is NP-complete or is in P [GJ79].
Thus all of the best-known algorithms are super-polynomial. Complexity theorists have created a class called
GI which consists of those problems which are polynomial time reducible to the graph isomorphism problem.
In this terminology, we have shown that computing bisimplicial maps is GI-hard in general.

4.2. The Combinatorial Approach is Hard: Empirical Findings. Although the bisimpliciality problem
is provably hard as previously shown, one may wonder whether it is possible to use heuristic combinatorial
optimization techniques (over Z/2Z). Examples include random walks, simulated annealing, or greedy
search with randomized restarts. Here, we give some empirical evidence suggesting that these approaches
are unlikely to work for finding bisimplicial maps. Although it is possible that many simplicial maps exist
for certain artificially constructed cases, the examples below suggest that such maps are "rare".

Consider one of the simplest conceivable nontrivial cases: where both X and Y are squares containing
the simplices {[0], [1], [2], [3], [0, 1], [1, 2], [2, 3], [0, 3]}. Suppose that we are looking for a map $f : C_*(X) \to C_*(Y)$
such that f is forward and backward simplicial. As stated before we wish to find a minimizer for
equation (7).

The homotopies in this case consist of the images of all simple tensors of the form $[a]^* \otimes [b, c]$ under the
map $d^H_1$. Thus there are a total of 16 homotopies. Since we are dealing with Z/2Z coefficients, there are
$2^{16} = 65{,}536$ possible choices for the homotopy coefficients. Although it is not practical in general, we may
simply enumerate all possible sets of coefficients. Doing so, we find
that out of the 65,536 possibilities, only 16 choices of coefficients yield bisimplicial maps. Figure 1 shows the
set of all coefficients enumerated on the horizontal axis (by using binary representation) with the simplicial
objective function value on the vertical axis.
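The enumeration for the two squares is small enough to reproduce directly. The sketch below is an independent reimplementation under stated assumptions, not the authors' code: the base chain map is taken to be the identity, maps over Z/2Z are encoded as 32-bit masks (16 vertex-to-vertex entries followed by 16 edge-to-edge entries), and the penalty is the largest image size plus the largest preimage size, as in equation (7).

```python
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3)]  # edges of the square on vertices 0..3


def bit(block, src, tgt):
    """Mask of one matrix entry; block 0 is vertex -> vertex, block 1 is edge -> edge."""
    return 1 << (16 * block + 4 * tgt + src)


def homotopy_image(v, e):
    """Mask of d(v* (x) e) over Z/2Z: v goes to the boundary of edge e, and
    every source edge having v as a face goes to e."""
    a, b = EDGES[e]
    mask = bit(0, v, a) ^ bit(0, v, b)
    for k, (s, t) in enumerate(EDGES):
        if v in (s, t):
            mask ^= bit(1, k, e)
    return mask


def penalty(mask):
    """Largest image size plus largest preimage size of the encoded map."""
    best_col = best_row = 0
    for block in (0, 1):
        for i in range(4):
            col = sum((mask >> (16 * block + 4 * t + i)) & 1 for t in range(4))
            row = sum((mask >> (16 * block + 4 * i + s)) & 1 for s in range(4))
            best_col, best_row = max(best_col, col), max(best_row, row)
    return best_col + best_row


def enumerate_penalties():
    identity = 0
    for i in range(4):
        identity ^= bit(0, i, i) ^ bit(1, i, i)
    basis = [homotopy_image(v, e) for v in range(4) for e in range(4)]
    maps = [identity] * (1 << 16)
    for code in range(1, 1 << 16):
        low = (code & -code).bit_length() - 1      # index of the lowest set bit
        maps[code] = maps[code & (code - 1)] ^ basis[low]
    return [penalty(m) for m in maps]


penalties = enumerate_penalties()
```

Counting the entries of `penalties` equal to 2 gives the number of coefficient choices that yield bisimplicial maps; under these assumptions it agrees with the count of 16 reported above (the 8 symmetries of the square, each reached by two coefficient vectors).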

Similarly, for finding mappings between a triangle and a square, out of the 4096 possibilities only 7 of
them yield minima. In the case where X and Y are circles with 8 vertices (octagons), simulated
annealing was performed. After 25,300 iterations a minimum value of 11 was reported for the simplicial
objective function. Since the identity map is a minimizer, the actual minimum should be 2. This was also
the case with the other heuristics (greedy search and random walks) - we have yet to find a nontrivial
pair in which one of these techniques is able to find a map that comes close to minimizing (7). While these
examples do not constitute proof, they suggest that bisimplicial maps are extremely rare among chain maps.

4.3. Matrix Representation. For convenience, we fix some notation that will be used for the rest of this
document. Let the capital letters $F_m$ denote the matrix representations of the chain maps $f_m$, and $H_n$
denote the matrix representations of the homotopies $h_n$. We let $F = \sum_m F_m$ be the sum of chain maps. We
denote an arbitrary member of [X, Y] by g, and its matrix representation by G. The basis elements $\{\sigma_i\}$ of
X correspond to the unit vectors $\{e_i\}$, and the basis elements $\{\tau_j\}$ of Y correspond to the unit vectors $\{e_j\}$.

4.4. Minimizing Image and Preimage Size. Let us explore the first two criteria stated at the beginning 
of this section. We said that ideally we would like to minimize the number of non-zero terms in the image 
and adjoint-image of each simplex. This can be formulated as a special case of the following optimization 
problem:

(8) minimize over c: $\max_{\sigma \in X} \|g(\sigma)\|_p + \max_{\tau \in Y} \|g^*(\tau)\|_p$
subject to $g = \sum_m b_m f_m + \sum_n c_n h_n$

Figure 1. Enumeration of all of the homotopy representatives of the chain map inducing an
isomorphism between two squares. The horizontal axis indicates the binary representation
of the set of 16 coefficients for the homotopies, and the vertical axis shows the bisimpliciality
penalty.



Note that in the above expression, the coefficients {b m } are fixed beforehand. This means that we make 
an initial selection of which homological features to preserve. If they were not selected and were optimization 
variables, the problem above would be minimized by the zero map. 

It is also possible to replace the max terms by sums over the domain and codomain, however such a 
replacement yields less meaningful maps (Consider the case of the chain map which maps each vertex in the 
domain to a single vertex in the codomain). 

To minimize the maximum cardinality or the sum of the cardinalities, one can use p = 0 (although this
is not actually a norm). However, we have seen that such a combinatorial approach is difficult, and since
the problem dimension we are dealing with is multiplicative in the sizes of the simplicial complexes X and
Y, this is out of the question. As practiced in many optimization settings, one may relax the cardinality
minimization problem to a 1-norm minimization problem. The intuition behind this is that the unit ball in
the 1-norm is the convex hull of the points $\{\pm e_i\}$ and that constrained optima tend to lie on corner points
which are sparse.

In the case of the 1-norm, we can rewrite this as:

(9) minimize over c: $\|G\|_1 + \|G^T\|_1$
subject to $G = \sum_m b_m F_m + \sum_n c_n H_n$



The 1-norm of a matrix is defined to be the maximum absolute column sum of its entries, and is equal to 
the operator norm induced by the vector 1-norm. It turns out that this problem is convex and in fact can 
be reformulated as a linear program. 
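One standard way to carry out this reformulation is sketched below. The auxiliary variables $U$ (entrywise bounds on $|G|$), $s$ (bound on the column sums) and $t$ (bound on the row sums) are introduced here for illustration and do not appear in the original text; G is taken to be a $J \times I$ matrix with entries $G_{ji}$.

```latex
\begin{aligned}
\underset{c,\,U,\,s,\,t}{\text{minimize}}\quad & s + t \\
\text{subject to}\quad & G = \sum_m b_m F_m + \sum_n c_n H_n, \\
& -U_{ji} \le G_{ji} \le U_{ji} \quad \text{for all } i, j, \\
& \sum_{j} U_{ji} \le s \quad \text{for all } i \quad (\text{column sums, bounding } \|G\|_1), \\
& \sum_{i} U_{ji} \le t \quad \text{for all } j \quad (\text{row sums, bounding } \|G^T\|_1).
\end{aligned}
```

At an optimum, $U_{ji} = |G_{ji}|$ can be assumed without loss, so $s$ and $t$ equal the two norms and the objective coincides with that of (9).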

Although the optimization problem (8) with $p \geq 1$ is useful in that it eliminates many solutions which
we view as being inadmissible, its set of optima is still closed under convex combinations. This
means, for example, that in the case of the two triangles, the map that sends each vertex to $\frac{1}{3}([0] + [1] + [2])$
is an optimum, but is not what we are looking for. Thus, our viewpoint changes from (8) being the answer
to our search for a suitable optimization problem, to instead being a definition of admissibility of a map:

Definition 4. A map g is said to be admissible if it is a member of the set of optima of (8) with p = 1 or
equivalently (9). Denote the admissible set by C.

The next step is to somehow identify the points within C that satisfy some sort of sparsity requirement.
One possibility is to require the coefficients $\{c_n\}$ to be integral. This brings us into the realm of integer linear
programming, which turns out to be NP-hard (in the absence of the property of integral vertices, which is not
satisfied in this situation). In fact, the integer feasibility problem (finding a point with integer coordinates
in a polytope defined by $\{x \mid Ax \leq b\}$) is also NP-hard. Computationally, this can be applied to situations
where both X and Y are small complexes, but it does not scale well.

The second possibility is to optimize some measure of sparsity or peakiness over the points in C. One
strategy is to maximize the ratio of the 2-norm to the 1-norm. (To see that this is reasonable, one can consider
the vectors (1, 0, ..., 0) and (1/n, ..., 1/n).) Another possibility is to minimize the function which measures the
distance of a point to its nearest integral point (let the objective be the norm of $(x_1 - [x_1], \ldots, x_n - [x_n])$,
where $[x_i]$ is the integer nearest to $x_i$). Although
both these objectives have the property that their global optima over C are the simplicial maps, both of them
suffer from being non-convex. In fact, since the vertices of C are local optima of these functions, global
optimization would involve searching all of the corner points. Once again, the task of enumerating vertices
of a convex polytope turns out to be NP-hard [KBB+08]. A third, randomized heuristic is to select a random
search direction v and solve the problem

(10) minimize $v^T c$
subject to $c \in \operatorname{argmin}_c \left( \max_{\sigma \in X} \|g(\sigma)\|_p + \max_{\tau \in Y} \|g^*(\tau)\|_p \right)$

This will result in the selection of a random corner point of C. This can be repeated until a sufficiently sparse
(close to simplicial) map is found.
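The two sparsity measures mentioned above are easy to compare on the example vectors. The sketch below is only illustrative; in particular, using the Euclidean norm for the distance to the nearest integral point is an assumption, since the text does not fix the norm.

```python
import math


def norm_ratio(x):
    """Ratio of the 2-norm to the 1-norm; larger values mean a more peaked vector."""
    return math.sqrt(sum(v * v for v in x)) / sum(abs(v) for v in x)


def distance_to_integers(x):
    """Euclidean distance from x to the nearest point with integer coordinates."""
    return math.sqrt(sum((v - round(v)) ** 2 for v in x))


peaked = [1.0, 0.0, 0.0, 0.0]
spread = [0.25, 0.25, 0.25, 0.25]
```

For a vector of length n the ratio ranges from $1/\sqrt{n}$ (uniform) to 1 (a single non-zero entry), so the peaked vector scores 1 and the spread vector scores 1/2 here.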

4.5. The Alexander-Whitney Map. It is easy to see that there is no convex objective function that will
select all of the simplicial maps, since a convex function cannot have a disconnected set of minima. However,
if we discard the requirement of unbiasedness (not favoring one simplicial map over another) or convexity
we can devise other methods for selecting favorable maps. The following method dispenses with convexity.
Define the Alexander-Whitney map $\Delta : C_*(X) \to C_*(X) \otimes C_*(X)$ by

$\Delta(\sigma) = \sum_{i=0}^{\deg \sigma} \sigma|_{0 \ldots i} \otimes \sigma|_{i \ldots \deg \sigma}$

where $\sigma|_{0 \ldots i}$ is $[v_0, \ldots, v_i]$ if $\sigma = [v_0, \ldots, v_n]$, and $\sigma|_{i \ldots \deg \sigma}$ is $[v_i, \ldots, v_n]$. It is a routine calculation to show that
the following diagram commutes if and only if f is a simplicial chain map:

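\[
\begin{array}{ccc}
C_*(X) & \xrightarrow{\;f\;} & C_*(Y) \\
\downarrow{\scriptstyle A} & & \downarrow{\scriptstyle A} \\
C_*(X) \otimes C_*(X) & \xrightarrow{\;f \otimes f\;} & C_*(Y) \otimes C_*(Y)
\end{array}
\tag{11}
\]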
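
To make the definition of A concrete, the following Python sketch lists the terms of the Alexander-Whitney map on a single ordered simplex. It is an illustration only: the name `alexander_whitney` and the representation of a simplex as a tuple of vertex labels are assumptions made here, and each term is returned as a (front face, back face) pair standing for a tensor product with coefficient 1.

```python
def alexander_whitney(simplex):
    """Terms of A(sigma) for sigma = [v_0, ..., v_n].

    Returns the pairs (sigma|_{0..i}, sigma|_{i..n}) for i = 0, ..., n.
    Each pair represents one tensor product term with coefficient 1.
    """
    sigma = tuple(simplex)
    return [(sigma[: i + 1], sigma[i:]) for i in range(len(sigma))]

# A triangle [a, b, c] produces three terms:
#   [a] (x) [a, b, c]  +  [a, b] (x) [b, c]  +  [a, b, c] (x) [c]
triangle_terms = alexander_whitney(("a", "b", "c"))
for front, back in triangle_terms:
    print(front, "(x)", back)
```

A simplex of degree n always yields n + 1 terms, and the front and back faces of each term share exactly one vertex.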



Thus we can measure the deviation of a chain map from simpliciality by measuring the norm of the 
difference $g \otimes g(A(\sigma)) - A(g(\sigma))$. Given a loss function $L$ on $\mathbb{R}^J \otimes \mathbb{R}^J$, we can define 

\[
L_{AW}(g) = \sum_{\sigma \in X} L\big(g \otimes g(A(\sigma)) - A(g(\sigma))\big)
\tag{12}
\]

For example, a convenient choice would be the quadratic loss 

\[
L_{AW}(g) = \sum_{\sigma \in X} \big\| g \otimes g(A(\sigma)) - A(g(\sigma)) \big\|_2^2
\tag{13}
\]

Thus, given a choice of loss function, we can solve 

\[
\begin{aligned}
\text{minimize} \quad & L_{AW}(g) + L_{AW}(g^{*}) \\
\text{subject to} \quad & g \in [X, Y]
\end{aligned}
\tag{14}
\]

Note that even if the original loss function $L$ is convex, $L_{AW}$ will not be convex on the affine space of chain 
maps. However, it is clear that the minima of the above optimization problem will be precisely the bisimplicial 
maps, in the case where one exists. 

Remark 2. A naive implementation of the above would be very expensive to compute. Suppose that $|X| = I$ 
and $|Y| = J$, so the matrix representation of a map $f$ is a $J \times I$ matrix $F$. At some point, we need to compute 
a product of the form $(F \otimes F)v$. The simple way would be to form the matrix $F \otimes F$ (of size $J^2 \times I^2$), and 
then perform the matrix-vector multiplication. The cost of this operation is $O(I^2 J^2)$. 

However, it is possible to rearrange this product differently. Suppose that $v$ is a vector in $\mathbb{F}^{I^2}$. We reshape 
this into the matrix $V$ of size $I \times I$ by dividing $v$ into blocks of length $I$ and laying them side by side. It 
turns out that computing the product $(F \otimes F)v$ is equivalent to computing $F V F^T$ and then reshaping it into 
vector form (see [HS80]). The two matrix products cost $O(I^2 J)$ and $O(I J^2)$ respectively, so the complexity 
of this operation is $O(I^2 J + I J^2)$, which is $O(I^2 J)$ when $J \le I$. 
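
The equivalence in Remark 2 can be checked numerically. The sketch below, written in plain Python with nested lists, compares the naive product with the reshaped one on a small integer matrix. The function names are hypothetical, and the indexing convention is an assumption chosen to match the block description in the text: v is reshaped column by column, and the entry of the Kronecker product in row aJ + b and column cI + d is F[a][c] F[b][d].

```python
def kron_matvec_naive(F, v):
    """Compute (F (x) F) v entry by entry, without using any structure."""
    J, I = len(F), len(F[0])
    out = []
    for a in range(J):
        for b in range(J):
            out.append(sum(F[a][c] * F[b][d] * v[c * I + d]
                           for c in range(I) for d in range(I)))
    return out

def kron_matvec_fast(F, v):
    """Compute (F (x) F) v as the column-major vectorization of F V F^T."""
    J, I = len(F), len(F[0])
    # Block c of v (length I) becomes column c of V, so V[d][c] = v[c*I + d].
    V = [[v[c * I + d] for c in range(I)] for d in range(I)]
    # FV is J x I: one product costing O(I^2 J).
    FV = [[sum(F[b][d] * V[d][c] for d in range(I)) for c in range(I)]
          for b in range(J)]
    # W = FV F^T is J x J: one product costing O(I J^2).
    W = [[sum(FV[b][c] * F[a][c] for c in range(I)) for a in range(J)]
         for b in range(J)]
    # Stack the columns of W back into a vector of length J^2.
    return [W[b][a] for a in range(J) for b in range(J)]

F = [[1, 2, 0],
     [0, 1, 3]]              # J = 2, I = 3
v = list(range(1, 10))       # a vector of length I^2 = 9

print(kron_matvec_naive(F, v))
print(kron_matvec_fast(F, v))
```

Both calls return the same vector of length J^2, while the second never forms the J^2 x I^2 matrix.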

5. Extensions and Applications 

5.1. Coordinatization on a Manifold. In this section we investigate how the method previously described 
can be used to find manifold-valued coordinates for a given simplicial complex. Suppose that we have the 
following data: 

• $X$: The domain simplicial complex. This is the "original" data set under investigation. 

• $M$: A manifold that we wish to compute coordinates on. 

• $Y$: A triangulation of the manifold $M$. 

• $\varphi$: A homeomorphism, $Y \to M$, corresponding to the triangulation of $M$. 

• $\psi$: A "localization" function, $C_0(Y) \to M$, which maps zero-dimensional chains on $Y$ to points on 
the manifold. 

We enforce a compatibility constraint on the functions $\varphi$ and $\psi$: if $a = \sum_i c_i \sigma_i$ where $c_i = \delta_{ij}$, then 
$\psi(a) = \varphi(\sigma_j)$. Our objective is to find a coordinate mapping $p : X \to M$. 

The first approach to computing coordinates on $M$ is to compute the parameterization of $[X, Y]$ and 
then to perform one of the optimization routines in the previous section to obtain a map $f : C_*(X) \to C_*(Y)$. 
Then, given a vertex $\sigma \in X$, we define its coordinate on $M$ to be 

\[
p(\sigma) = \psi(f(\sigma))
\]

A second approach relies on the geometry of the manifold. Suppose that M is equipped with a Riemannian 
metric g. We wish to find a mapping X — > M which minimizes the total distortion across the 1-skeleton of 
X. In other words, we wish that vertices connected by an edge in X should be mapped nearby in M. This 
can be precisely formulated as the following optimization problem: 

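\[
\begin{aligned}
\text{minimize} \quad & \sum_{[\sigma_i, \sigma_j] \in X_1} d^2\big(\psi(f(\sigma_i)), \psi(f(\sigma_j))\big) \\
\text{subject to} \quad & f \in [X, Y]
\end{aligned}
\tag{15}
\]

The metric $d$ on $M$ has the standard definition: 

\[
d(x, y) = \inf\{\, \ell(\gamma) \mid \gamma : [0, 1] \to M,\ \gamma \in C^1,\ \gamma(0) = x,\ \gamma(1) = y \,\}
\]

where $\ell(\gamma)$ is the length of the curve $\gamma$ defined by the Riemannian metric $g$. 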

5.2. Example: Mapping to a circle. Let $M = \mathbb{R}/2\pi\mathbb{Z}$ be the circle, and suppose that $Y$ is an $n$-polygon 
homeomorphic to $M$. We define the coordinate of the $k$-th vertex to be $2\pi k/n$, where $k = 0, \ldots, n - 1$. We 
also define $\psi$ to map a chain in $C_0(Y)$ to the weighted sum of its basis elements. So we have that 

\[
\psi\Big(\sum_i c_i \sigma_i\Big) = \sum_i c_i \varphi(\sigma_i)
\]

Our distance function on $M$ becomes $d(x, y) = (y - x) \bmod 2\pi$. Given this setup, we can find $M$-valued 
coordinates for a data set of interest, $X$, by solving the optimization problem (15). An example is shown in 
Figure 5. 
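
The ingredients of the circle example can be written down directly. The following Python sketch evaluates the distortion objective for a given assignment of 0-chains to the vertices of X. It is only an illustration under stated assumptions: the function names, the dictionary representation of a 0-chain (vertex index to coefficient), and the small data set at the end are hypothetical, and the distance is implemented literally as (y - x) mod 2π, which depends on the order of its arguments.

```python
import math

TWO_PI = 2 * math.pi

def vertex_angle(k, n):
    """Coordinate of the k-th vertex of the n-gon Y on M = R / 2*pi*Z."""
    return TWO_PI * k / n

def localize(chain, n):
    """psi: weighted sum of the vertex coordinates of a 0-chain."""
    return sum(c * vertex_angle(k, n) for k, c in chain.items())

def circle_distance(x, y):
    """d(x, y) = (y - x) mod 2*pi, as stated in the text."""
    return (y - x) % TWO_PI

def total_distortion(edges, images, n):
    """Sum of squared distances between the images of adjacent vertices of X."""
    return sum(
        circle_distance(localize(images[i], n), localize(images[j], n)) ** 2
        for i, j in edges
    )

# Hypothetical data: a path with three vertices mapped into a square (n = 4).
images = {0: {0: 1.0}, 1: {0: 0.5, 1: 0.5}, 2: {1: 1.0}}
edges = [(0, 1), (1, 2)]
print(total_distortion(edges, images, n=4))
```

In this toy case the middle vertex is sent to the midpoint of the first two polygon vertices, so each edge contributes (π/4)^2 to the objective.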

5.3. Density Maximization. Suppose that we have a set of points in Euclidean space, $Y_0 \subset \mathbb{R}^n$, and a 
simplicial model $X$. We wish to find a mapping from the vertices in $X$ to Euclidean space such that the 
images of these vertices land in regions of high density. The interpretation is that this process produces a 
clustering of the data that is aware of the topological structure of the data set $Y$. 
To do this, we begin with two pieces of data: 

• A filtered simplicial complex, $Y$, constructed from the vertices $Y_0 \subset \mathbb{R}^n$. For example, one may use 
the Vietoris-Rips or witness constructions discussed in [Car09]. 

• An empirical density estimator on $\mathbb{R}^n$, which we call $f(y)$, created from the data points $Y_0$. A 
common choice would be a kernel density estimate defined by 



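\[
f(y) = \frac{1}{N h^{n}} \sum_{i=1}^{N} K\!\left(\frac{y - y_i}{h}\right)
\]

where $y_1, \ldots, y_N$ are the points of $Y_0$ and $h > 0$ is a bandwidth parameter. 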



A reasonable choice for the kernel function K would be the standard Gaussian density function. 

Given a chain map $g : C_*(X) \to C_*(Y)$, if $\sigma \in C_0(X)$, then we interpret the image $g(\sigma) \in C_0(Y)$ to 
be the weighted average of the points in the chain. In other words, define $\psi(\sum_j c_j \tau_j) = \sum_j c_j \varphi(\tau_j)$, where 
$\varphi : Y_0 \to \mathbb{R}^n$ takes a vertex in $Y$ to its Euclidean coordinates. Since we wish to move points in the image of 
the chain map to regions of high density, we form the following optimization problem: 

\[
\begin{aligned}
\text{maximize} \quad & \sum_{\sigma \in X} f(\psi(g(\sigma))) \\
\text{subject to} \quad & g \in [X, Y]
\end{aligned}
\tag{16}
\]
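
As an illustration of the density objective, the sketch below builds a Gaussian kernel density estimate from a handful of points and evaluates the sum of densities at the images of the domain vertices. Everything here is a hedged example rather than the code used for the experiments: the function names, the bandwidth value, the dictionary representation of a 0-chain, and the sample points are all assumptions, and the chain coefficients are assumed to sum to 1 so that the image is a weighted average.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return f(y), a Gaussian kernel density estimate built from `points`."""
    count, dim = len(points), len(points[0])
    norm = count * (bandwidth * math.sqrt(2 * math.pi)) ** dim

    def density(y):
        total = 0.0
        for p in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(y, p))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / norm

    return density

def localize(chain, coords):
    """psi: weighted combination of the Euclidean coordinates of a 0-chain."""
    dim = len(next(iter(coords.values())))
    return [sum(c * coords[v][k] for v, c in chain.items()) for k in range(dim)]

def density_objective(images, coords, density):
    """Total density at the images of the vertices of X."""
    return sum(density(localize(chain, coords)) for chain in images.values())

# Hypothetical data: four sample points in the plane and two candidate images.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (5.0, 5.0)}
f = gaussian_kde(list(coords.values()), bandwidth=0.5)
near = {"s": {0: 0.5, 1: 0.5}}   # midpoint of two nearby samples
far = {"s": {0: 0.5, 3: 0.5}}    # midpoint of two distant samples
print(density_objective(near, coords, f), density_objective(far, coords, f))
```

The candidate whose image lies among the sample points scores higher, which is the behavior the maximization is meant to reward.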

5.4. Mapper and Contractible Data Sets. Another extension of the idea of hom-based mappings is to 
shape matching or data fusion. The discussion of gene expression data in the introduction provides the 
motivation for this. Suppose that we have two data sets X and Y, which for now are just sets of points. If 
these data sets arise from sampling a null-homotopic space, then Proposition 2 tells us that the homological 
mapping technique described in the previous section will not be very helpful. However, there is a way to 
remedy this. 



In the paper [SMC], the authors describe a multiscale decomposition method called mapper. The idea is 
that one has a filter function $f : X \to \mathbb{R}$, and we cluster the set X according to preimages of overlapping 
intervals. Although we do not fully describe the method here, mapper allows a statistical practitioner to 
obtain a multiscale representation of the data at different resolutions. The reader is advised to consult [SMC] 
for a detailed discussion. 

Given outputs of the mapper algorithm, we are interested in constructing maps between different rep- 
resentations of the same dataset, either from different filter functions or from different scale parameters. 
This problem is relevant in biological settings in which the obtained datasets are highly dependent on the 
measuring procedures used. We combine our homological mapping procedure with mapper into a structure 
mapping method as follows: 

• Given two data sets $X$ and $Y$, and two filter functions $f_X : X \to \mathbb{R}$ and $f_Y : Y \to \mathbb{R}$, we run the mapper 
algorithm to obtain reduced simplicial models $T_X$ and $T_Y$. For the 1-dimensional version of mapper 
on a contractible data set, $T_X$ and $T_Y$ will be trees. 

• Since the filter functions $f_X$ and $f_Y$ are also defined on $T_X$ and $T_Y$, we take the quotient of each 
tree by the set of vertices which are local maxima of the filter functions. This yields two graphs 
$G_X$ and $G_Y$. In general, these two graphs may have cycles. 

• We run the homological mapping algorithm to obtain an optimal chain map $g$ between $C_*(G_X)$ and 
$C_*(G_Y)$. 
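
The quotient step in the list above can be sketched in Python as follows. This is one possible reading rather than the authors' implementation: it assumes that taking the quotient means identifying all local maxima with a single new vertex, that a local maximum is a vertex whose filter value is strictly larger than that of every neighbor, and that parallel edges are kept. The function names and the example tree are hypothetical. The cycle rank (edges minus vertices plus connected components) shows how a tree can acquire a cycle.

```python
def local_maxima(edges, filt):
    """Vertices whose filter value strictly exceeds that of every neighbor."""
    neighbors = {v: set() for v in filt}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {v for v, nbrs in neighbors.items()
            if nbrs and all(filt[v] > filt[w] for w in nbrs)}

def quotient(edges, collapse, label="*"):
    """Identify every vertex in `collapse` with one new vertex `label`."""
    def rename(v):
        return label if v in collapse else v
    # Keep a list so that parallel edges created by the collapse survive.
    return [(rename(u), rename(v)) for u, v in edges]

def cycle_rank(edges):
    """First Betti number of a graph: edges - vertices + components."""
    vertices = {x for e in edges for x in e}
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(v) for v in vertices})
    return len(edges) - len(vertices) + components

# Hypothetical tree: a path whose filter value is largest at both ends.
tree_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
filt = {0: 2.0, 1: 1.0, 2: 0.0, 3: 1.0, 4: 2.0}

maxima = local_maxima(tree_edges, filt)
graph_edges = quotient(tree_edges, maxima)
print(maxima, cycle_rank(tree_edges), cycle_rank(graph_edges))
```

Here the two endpoints of the path are glued together, so the contractible tree becomes a graph with one independent cycle.
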

6. Implementation and Results 

6.1. Software. The above ideas were implemented as described below in a new version of the JavaPlex 
software package [TVJA11]. Further optimization and scripting was performed using Matlab. The 
computation of the homology of the hom-complex was performed over the field $\mathbb{Q}$ in exact arithmetic, and the 
optimization was performed in floating-point. 

We also note that this entire mapping procedure can be performed in a persistent setting where, in addition 
to the natural grading of the chain complex by dimension, we have a grading by filtration. For a good 
discussion on persistent homology, the reader is invited to look at [Car09] and [ZC05]. The one modification 
we make is that we only select representatives for $[X, Y]$ which correspond to nontrivial persistence intervals. 
In other words, in the expression $[X, Y] = \{\sum_m b_m f_m + \sum_n c_n h_n\}$, we set the coefficients $b_m$ equal to 1 for 
significant intervals and 0 for nonsignificant intervals. 

6.2. Visualization. The visualizations in this section show various examples of mappings between simplicial 
complexes. The domain complexes are on the left, and the codomains are on the right. Colors of the 
complexes are computed as follows: 

• The color of the domain complex is fixed. We start with a map $\mu : X_0 \to [0, 1]^3$ mapping the vertices 
in $X$ to their RGB values. The color of an $n$-simplex for $n > 0$ is defined to be the average of the 
colors of its vertices. In other words, $\mu([v_0, \ldots, v_n]) = (n + 1)^{-1} \sum_i \mu([v_i])$. 

• To compute the color of a simplex $\tau \in Y$ under the map $f : C_*(X) \to C_*(Y)$, we define $\mu_* : Y \to 
[0, 1]^3$ by $\mu_*(\tau) = \mu(f^*(\tau))$, where we extend $\mu$ linearly over chains in $X$, and where $f^*$ is the adjoint 
of $f$. This is analogous to the definition of the pushforward of a measure. 
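
The coloring rule can be illustrated with a short Python sketch. It assumes, as a simplification, that the chain map is given in a single dimension by a J x I matrix F and that the adjoint is the transpose with respect to the simplex bases, so that the color of the j-th codomain simplex is the combination of domain colors weighted by row j of F. The function names and the tiny example are hypothetical.

```python
def simplex_color(simplex, vertex_color):
    """Average of the RGB values of the vertices of a simplex."""
    n = len(simplex)
    return tuple(sum(vertex_color[v][k] for v in simplex) / n for k in range(3))

def pushforward_colors(F, domain_colors):
    """Color of each codomain simplex tau_j: sum over i of F[j][i] * mu(sigma_i).

    Row j of F holds the coefficients of the adjoint of f applied to tau_j.
    """
    return [
        tuple(sum(row[i] * domain_colors[i][k] for i in range(len(row)))
              for k in range(3))
        for row in F
    ]

# Hypothetical example: two domain vertices (red and green) sent to one vertex.
vertex_color = {0: (1.0, 0.0, 0.0), 1: (0.0, 1.0, 0.0)}
edge_color = simplex_color((0, 1), vertex_color)

F = [[1.0, 1.0]]   # a single codomain vertex receiving both domain vertices
codomain_colors = pushforward_colors(F, [vertex_color[0], vertex_color[1]])
print(edge_color, codomain_colors)
```

Because the single row of F sums to 2, the resulting color is the sum of the two domain colors rather than their average, which is why rows summing to more than 1 produce more intense colors in the codomain.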

6.3. Examples. In Figure 2 we show an example of a homotopy representative between a circle with 8 
vertices and a circle with 4 vertices. In order to compute the map, a random corner point of the admissible 
set, C, was selected. In other words, the map was a random extremal point of the polytope of maps minimizing 
the maximum row and column sums. The computed map is given below, where the block in the upper left 
corner is the map on the 0-skeleton and the block in the lower right corner is the map on the 1-skeleton: 



(17) 



In Figure 3 we show an example of a map computed by minimizing the Alexander-Whitney function with 
quadratic loss. In Figure 4, the figure on the left is a simplicial complex created using the lazy-witness 
construction from a sample of 500 points on a trefoil knot [dSC04]. A map to a circle with 4 vertices is 
shown. In Figures 5-7 we explore the applications described in section 5: computing a map to a manifold, 
density maximization, and contractible datasets. We refer the reader to the captions for more information 
regarding these. 

7. Concluding Remarks 

In this paper we have discussed a method for computing maps between two simplicial complexes that 
respect their homological structure. The computation is done in a two-stage process: first a parameterization 
is obtained for the homotopy classes of chain maps, and then an optimization procedure is run to select one 
of the maps from the affine parameterization. We have also demonstrated the method on various examples. 
Some key distinguishing features in comparison with traditional statistical dimensionality reduction and 
mapping techniques include: 

• The domain and codomain data sets are not required to be Euclidean spaces, or even metric spaces. 

• Conventional linear and nonlinear dimensionality reduction methods rely on the fact that the data 
can be somehow unfolded into a convex subset of Euclidean space. The homological method presented 
is designed to preserve nontrivial topological structure. 

• Unlike the method of circular coordinates, or various other surface mapping algorithms, in principle 
the method presented in this paper is not restricted by the dimension or structure of either the 
domain or codomain spaces. 

Nevertheless, the mapping technique in its current state suffers from a few shortcomings: 

• There is no universal optimization problem which produces geometrically satisfying maps in all 
cases. Depending on the application or situation, a practitioner might want to use different objective 
functions or constraints. 

• Since the computation relies on the construction of the hom-complex, the fundamental problem 
size given simplicial complexes of sizes $I$ and $J$ is the product, $IJ$. This leads to somewhat poor 
algorithmic complexity in comparison with first order methods and has limited the sizes of the 
examples presented. 

A key step to improving the applicability of hom-complex based mappings would be to alleviate the 
problems with its algorithmic efficiency. It would be interesting to investigate what optimizations would 
enable this method to scale to datasets of more reasonable size. Despite these shortcomings, the examples 
in this paper are designed to be a proof-of-concept for hom-complex based mappings. 

Figure 2. Simple example of a visualization of a chain map. The map was computed by 
selecting a random extremal point of the polytope C. An artefact of the visualization method 
described in section 6.2 is that the colors on the figure on the right are more intense than 
those on the left. This is due to the fact that the rows in the matrix in equation (17) sum to 
greater than 1. 

Figure 3. This example shows an icosahedron being mapped to an octahedron. This map 
was constructed by performing the Alexander-Whitney optimization with quadratic loss. 
Note that the map was rescaled to prevent the colors in the codomain from being washed 
out as in Figure 2. 

Figure 4. The shape on the left was created by first randomly sampling 500 points on 
a trefoil knot. From this, the lazy-witness construction was used to construct a filtered 
simplicial complex as described in [dSC04] on a landmark set of 40 points constructed by 
sequential max-min selection. The mapping was obtained by randomly selecting 100 vertices 
of the polytope C, and then choosing the one which had the greatest 2-norm to 1-norm ratio. 

Figure 5. Manifold map example. The domain, X, consists of a simplicial circle with 60 
vertices, and the codomain, Y, consists of a circle embedded in the plane. Given a map 
$f : X \to Y$, we may compute the embedded coordinates for the points in X as described 
in section 5.2. The object here is to find a map from X to Y that minimizes the total 
distortion across the 1-skeleton of the domain. On the left, the images of the domain points 
are shown as crosses, whereas the codomain points are shown as circles. On the right, we 
show the relationship between the original angular coordinates for points in the domain on 
the horizontal axis, versus the computed angular coordinates on the vertical axis. 

Figure 6. Density maximization example. In the above figure, the codomain points consist 
of a random sample from the unit circle, and the domain complex is an idealized circle with 
10 vertices. The locations of the domain points are computed by selecting the homotopy 
representative that maximizes the density of the image points as described in section 5.3. 

Figure 7. This figure shows the homological mapping algorithm applied to mapper outputs. 
The mapper outputs are quotiented out by maxima of the filtration function. For this 
example, we used an eccentricity filter. 



